Inferring missing links or detecting spurious ones based on observed graphs, known as link prediction, is a long-standing challenge in graph data analysis. With the recent advances in deep learning, graph neural networks have been used for link prediction and have achieved state-of-the-art performance. Nevertheless, existing methods developed for this purpose are typically discriminative, computing features of local subgraphs around two neighboring nodes and predicting potential links between them from the perspective of subgraph classification. In this formalism, the selection of enclosing subgraphs and heuristic structural features for subgraph classification significantly affects the performance of the methods. To overcome this limitation, this paper proposes a novel and radically different link prediction algorithm based on the network reconstruction theory, called GraphLP. Instead of sampling positive and negative links and heuristically computing the features of their enclosing subgraphs, GraphLP utilizes the feature learning ability of deep-learning models to automatically extract the structural patterns of graphs for link prediction under the assumption that real-world graphs are not locally isolated. Moreover, GraphLP explores high-order connectivity patterns to utilize the hierarchical organizational structures of graphs for link prediction. Our experimental results on all common benchmark datasets from different applications demonstrate that the proposed method consistently outperforms other state-of-the-art methods. Unlike the discriminative neural network models used for link prediction, GraphLP is generative, which provides a new paradigm for neural-network-based link prediction.
translated by 谷歌翻译
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
translated by 谷歌翻译
Recently, segmentation-based methods are quite popular in scene text detection, which mainly contain two steps: text kernel segmentation and expansion. However, the segmentation process only considers each pixel independently, and the expansion process is difficult to achieve a favorable accuracy-speed trade-off. In this paper, we propose a Context-aware and Boundary-guided Network (CBN) to tackle these problems. In CBN, a basic text detector is firstly used to predict initial segmentation results. Then, we propose a context-aware module to enhance text kernel feature representations, which considers both global and local contexts. Finally, we introduce a boundary-guided module to expand enhanced text kernels adaptively with only the pixels on the contours, which not only obtains accurate text boundaries but also keeps high speed, especially on high-resolution output maps. In particular, with a lightweight backbone, the basic detector equipped with our proposed CBN achieves state-of-the-art results on several popular benchmarks, and our proposed CBN can be plugged into several segmentation-based methods. Code will be available on https://github.com/XiiZhao/cbn.pytorch.
translated by 谷歌翻译
场景图生成(SGG)任务旨在在给定图像中检测所有对象及其成对的视觉关系。尽管SGG在过去几年中取得了显着的进展,但几乎所有现有的SGG模型都遵循相同的训练范式:他们将SGG中的对象和谓词分类视为单标签分类问题,而地面真实性是一个hot目标。标签。但是,这种普遍的训练范式忽略了当前SGG数据集的两个特征:1)对于正样本,某些特定的主题对象实例可能具有多个合理的谓词。 2)对于负样本,有许多缺失的注释。不管这两个特征如何,SGG模型都很容易被混淆并做出错误的预测。为此,我们为无偏SGG提出了一种新颖的模型不合命相的标签语义知识蒸馏(LS-KD)。具体而言,LS-KD通过将预测的标签语义分布(LSD)与其原始的单热目标标签融合来动态生成每个主题对象实例的软标签。 LSD反映了此实例和多个谓词类别之间的相关性。同时,我们提出了两种不同的策略来预测LSD:迭代自我KD和同步自我KD。大量的消融和对三项SGG任务的结果证明了我们所提出的LS-KD的优势和普遍性,这些LS-KD可以始终如一地实现不同谓词类别之间的不错的权衡绩效。
translated by 谷歌翻译
文本对象的重新识别(REID)旨在通过文本描述搜索感兴趣的身份的行人图像。由于丰富的模式内变化和明显的模式间差异,这是具有挑战性的。现有作品通常忽略两种方式之间的特征粒度差异,即,视觉特征通常是细粒度的,而文本特征则粗糙,这主要负责大型模式间间隙。在本文中,我们提出了一个基于变形金刚的端到端框架,以学习两种模式的粒度统一表示,称为LGUR。 LGUR框架包含两个模块:基于字典的粒度比对(DGA)模块和基于原型的粒度统一(PGU)模块。在DGA中,为了使两种模式的粒度对齐,我们引入了一个多模式共享词典(MSD)以重建视觉和文本特征。此外,DGA还具有两个重要因素,即跨模式指导和以前景为中心的重建,以促进MSD的优化。在PGU中,我们采用一组共享和可学习的原型作为查询,以提取粒度统一特征空间中这两种方式的多样化和语义对齐特征,从而进一步促进了REID的性能。综合实验表明,我们的LGUR在Cuhk-Pedes和ICFG-Pedes数据集上始终以大幅度的优势优于最先进的东西。代码将在https://github.com/zhiyinshao-h/lgur上发布。
translated by 谷歌翻译
我们提出了FITE,这是一种对服装中的人体化身进行建模的第一刻度框架。我们的框架首先学习了代表粗衣拓扑的隐式表面模板,然后采用模板来指导点集的产生,从而进一步捕获姿势依赖的服装变形,例如皱纹。我们的管道结合了隐式和明确表示的优点,即处理变化拓扑的能力以及有效捕获细节的能力。我们还提出了扩散的皮肤,以促进模板训练,尤其是用于宽松衣服的模板训练,以及基于投影的姿势编码,以从网格模板中提取姿势信息,而无需预定义的紫外线图或连接性。我们的代码可在https://github.com/jsnln/fite上公开获取。
translated by 谷歌翻译
在各种设备上部署深度学习模型已成为一个重要的话题。硬件专业化的浪潮为多维张量计算带来了一套多样化的加速度原始图。这些新的加速原始基原料以及新兴的机器学习模型带来了巨大的工程挑战。在本文中,我们提出了Tensorir,这是一种编译器抽象,用于通过这些张量计算原始素优化程序。Tensorir概括了现有机器学习编译器中使用的循环巢表示,以将张量计算作为一流的公民。最后,我们在抽象之上构建了一个端到端框架,以自动优化给定的张量计算原始图的深度学习模型。实验结果表明,Tensorir编译会自动使用给定硬件后端的张量计算原始图,并提供与跨平台的最新手工精制系统竞争性能的性能。
translated by 谷歌翻译
最近出现了有希望的表现,利用大型预训练的模型来实现各种感兴趣的下游任务。由于模型的规模不断增长,因此,在模型培训和存储方面,基于标准的完整任务适应策略的成本高昂。这导致了参数有效传输学习的新研究方向。但是,现有的尝试通常集中在预训练模型的相同模式(例如图像理解)的下游任务上。这会产生限制,因为在某些特定的方式(例如,视频理解)中,具有足够知识的强大预训练模型较少或不可用。在这项工作中,我们研究了这样一种新型的跨模式转移学习设置,即参数有效的图像到视频传输学习。为了解决此问题,我们为每个视频任务提出了一个新的时空适配器(ST-ADAPTER),以进行参数有效调整。凭借紧凑设计中的内置时空推理能力,ST-ADAPTER可以实现预训练的图像模型,而无需时间知识,以小(〜8%)的每任务参数成本来理解动态视频内容,以大约需要与以前的工作相比,更新参数少20倍。在视频动作识别任务上进行的广泛实验表明,我们的ST-ADAPTER可以匹配甚至优于强大的完整微调策略和最先进的视频模型,同时享受参数效率的优势。
translated by 谷歌翻译
部件组件是机器人中的典型但具有挑战性的任务,机器人将一组各个部件组装成完整的形状。在本文中,我们开发了用于家具组件的机器人组装仿真环境。我们将零件装配任务制定为混凝土加固学习问题,并提出了一种机器人的管道,以学习组装多种椅子。实验表明,当使用看不见的椅子进行测试时,我们的方法在以上对象的环境下实现了74.5%的成功率,并在完整环境下实现了50.0%。我们采用RRT-CONNECT算法作为基线,在计算时间明显更长的时间后,只能实现18.8%的成功率。我们的项目网页提供了补充材料和视频。
translated by 谷歌翻译
前列腺癌是美国男人的第二致致命癌症。虽然磁共振成像(MRI)越来越多地用于引导前列腺癌诊断的靶向活组织检查,但其效用仍然受到限制,因为假阳性和假否定的高率以及较低的读者协议。机器学习方法在前列腺MRI上检测和定位癌症可以帮助标准化放射科学诠释。然而,现有的机器学习方法不仅在模型架构中不等,而且还可以在用于模型培训的地面真理标签策略中。在这项研究中,我们比较不同的标记策略,即病理证实放射科标签,整个安装组织病理学图像上的病理学家标签,以及病变水平和像素级数字病理学家标签(先前验证了组织病理学图像上的深层学习算法以预测像素 - 整个安装组织病理学图像上的Gleason模式)。我们分析这些标签对训练有素的机器学习模型的性能的影响。我们的实验表明,用它们培训的(1)放射科标签和模型可能会错过癌症,或低估癌症程度,(2)与他们培训的数字病理学家标签和模型与病理学家标签有高度的一致性,而(3)用数字病理学家培训的模型标签在两种不同疾病分布的两种不同群组中达到最佳性能,而不管使用的模型建筑如何。数字病理学家标签可以减少与人类注释相关的挑战,包括劳动力,时间,和读者间变异性,并且可以通过使可靠的机器学习模型进行培训来检测和定位前列腺癌,帮助弥合前列腺放射学和病理学之间的差距在MRI。
translated by 谷歌翻译